Dataset statistics
| Number of variables | 17 |
|---|---|
| Number of observations | 888228 |
| Missing cells | 3452445 |
| Missing cells (%) | 22.9% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 579.2 MiB |
| Average record size in memory | 683.7 B |
Variable types
| CAT | 11 |
|---|---|
| NUM | 6 |
Reproduction
| Analysis started | 2020-02-26 02:53:54.218969 |
|---|---|
| Analysis finished | 2020-02-26 05:05:20.829865 |
| Version | pandas-profiling v2.5.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |

Warnings
| Warning | Type |
|---|---|
| model has a high cardinality: 954 distinct values | High cardinality |
| stk_year has a high cardinality: 113 distinct values | High cardinality |
| date_created has a high cardinality: 888228 distinct values | High cardinality |
| date_last_seen has a high cardinality: 839005 distinct values | High cardinality |
| maker has 129461 (14.6%) missing values | Missing |
| model has 283265 (31.9%) missing values | Missing |
| mileage has 90469 (10.2%) missing values | Missing |
| manufacture_year has 92350 (10.4%) missing values | Missing |
| engine_displacement has 185795 (20.9%) missing values | Missing |
| engine_power has 138657 (15.6%) missing values | Missing |
| body_type has 280220 (31.5%) missing values | Missing |
| color_slug has 835604 (94.1%) missing values | Missing |
| stk_year has 427803 (48.2%) missing values | Missing |
| transmission has 185429 (20.9%) missing values | Missing |
| door_count has 153884 (17.3%) missing values | Missing |
| seat_count has 187285 (21.1%) missing values | Missing |
| fuel_type has 462223 (52.0%) missing values | Missing |
| price_eur is highly skewed (γ1 = 942.4582842) | Skewed |
| date_created only contains datetime values, but is categorical. Consider applying pd.to_datetime() | Type |
| date_last_seen only contains datetime values, but is categorical. Consider applying pd.to_datetime() | Type |
| mileage has 40475 (4.6%) zeros | Zeros |
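
The per-variable missing counts in these warnings add up exactly to the Missing cells figure in the dataset statistics (the unnamed first column, date_created, date_last_seen and price_eur have no missing values), and dividing by observations × variables reproduces the reported 22.9%. A small stdlib sketch of that check, with the counts copied from the report:

```python
# Per-variable missing counts, copied from the warnings table above.
missing_by_variable = {
    "maker": 129461,
    "model": 283265,
    "mileage": 90469,
    "manufacture_year": 92350,
    "engine_displacement": 185795,
    "engine_power": 138657,
    "body_type": 280220,
    "color_slug": 835604,
    "stk_year": 427803,
    "transmission": 185429,
    "door_count": 153884,
    "seat_count": 187285,
    "fuel_type": 462223,
}

n_observations = 888228  # "Number of observations"
n_variables = 17         # "Number of variables"

# Total missing cells and their share of all cells in the table.
total_missing = sum(missing_by_variable.values())
missing_pct = 100 * total_missing / (n_observations * n_variables)

print(total_missing)
print(f"{missing_pct:.1f}%")
```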

(unnamed first column)
| Distinct count | 888228 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |

| Mean | 1778744.5171070942 |
|---|---|
| Minimum | 1 |
| Maximum | 3552909 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 178409.05 |
| Q1 | 889836.75 |
| median | 1779182 |
| Q3 | 2667137.5 |
| 95-th percentile | 3377288.95 |
| Maximum | 3552909 |
| Range | 3552908 |
| Interquartile range (IQR) | 1777300.75 |
Descriptive statistics
| Standard deviation | 1026197.063 |
|---|---|
| Coefficient of variation (CV) | 0.5769221228 |
| Kurtosis | -1.200018129 |
| Mean | 1778744.517 |
| Mean absolute deviation (MAD) | 888793.3012 |
| Skewness | -0.002733294476 |
| Sum | 1.579930685e+12 |
| Variance | 1.053080412e+12 |
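
The quantile and descriptive statistics above follow standard definitions; for example, CV = standard deviation / mean = 1026197.063 / 1778744.517 ≈ 0.577 for this column. The stdlib sketch below recomputes them for a plain list of numbers. It assumes linear interpolation between order statistics for quantiles (the default of pandas' `quantile`, and consistent with the fractional 5-th percentile of 178409.05 here) and the sample standard deviation with n - 1 in the denominator; it is an illustration, not pandas-profiling's own code.

```python
import math
import statistics


def quantile(sorted_values, q):
    """Linear interpolation between order statistics at position q * (n - 1)."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def describe(values):
    """Recompute the report's quantile and descriptive statistics for a list of numbers."""
    xs = sorted(values)
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs)  # sample standard deviation, n - 1 in the denominator
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "Minimum": xs[0],
        "5-th percentile": quantile(xs, 0.05),
        "Q1": q1,
        "median": quantile(xs, 0.5),
        "Q3": q3,
        "95-th percentile": quantile(xs, 0.95),
        "Maximum": xs[-1],
        "Range": xs[-1] - xs[0],
        "Interquartile range (IQR)": q3 - q1,
        "Mean": mean,
        "Standard deviation": std,
        "Variance": std ** 2,
        "Coefficient of variation (CV)": std / mean,
        "Sum": sum(xs),
    }


for name, value in describe(range(1, 11)).items():
    print(f"{name}: {value}")
```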
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.000000e+00 3.552909e+06], "bayesian blocks" binning strategy used)

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2047 | 1 | < 0.1% | |
| 3474068 | 1 | < 0.1% | |
| 1571751 | 1 | < 0.1% | |
| 1565604 | 1 | < 0.1% | |
| 1457055 | 1 | < 0.1% | |
| 3548060 | 1 | < 0.1% | |
| 1446810 | 1 | < 0.1% | |
| 3541913 | 1 | < 0.1% | |
| 1442712 | 1 | < 0.1% | |
| 2688262 | 1 | < 0.1% | |
| Other values (888218) | 888218 | > 99.9% |

Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 7 | 1 | < 0.1% | |
| 10 | 1 | < 0.1% | |
| 19 | 1 | < 0.1% |

Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3552909 | 1 | < 0.1% | |
| 3552905 | 1 | < 0.1% | |
| 3552902 | 1 | < 0.1% | |
| 3552900 | 1 | < 0.1% | |
| 3552892 | 1 | < 0.1% |

maker
| Distinct count | 46 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 129461 |
| Missing (%) | 14.6% |
| Memory size | 6.8 MiB |

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| skoda | 78681 | 8.9% | |
| volkswagen | 74582 | 8.4% | |
| bmw | 66795 | 7.5% | |
| mercedes-benz | 63040 | 7.1% | |
| audi | 61721 | 6.9% | |
| ford | 59858 | 6.7% | |
| opel | 54218 | 6.1% | |
| fiat | 32957 | 3.7% | |
| citroen | 30514 | 3.4% | |
| renault | 26505 | 3.0% | |
| Other values (36) | 209896 | 23.6% | |
| (Missing) | 129461 | 14.6% |
Length
| Max length | 13 |
|---|---|
| Mean length | 5.630189546 |
| Min length | 3 |
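
Distinct characters by Unicode category
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase_Letter | 25 | 96.2% |
| Dash_Punctuation | 1 | 3.8% |

Distinct characters by Unicode script
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 25 | 96.2% |
| Common | 1 | 3.8% |

Distinct characters by Unicode block
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 26 | 100.0% |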
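
The character tables count distinct characters, not occurrences: transmission, for instance, only takes the values "man" and "auto", which contain six distinct lowercase letters, matching its count of 6. A stdlib sketch of the category count using `unicodedata.category`; the `CATEGORY_NAMES` mapping to the long names shown in the report is our own hypothetical helper, and script and block lookups are omitted because the standard library does not provide them.

```python
import unicodedata
from collections import Counter

# Hypothetical helper: short Unicode category codes -> long names used in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase_Letter",
    "Lu": "Uppercase_Letter",
    "Nd": "Decimal_Number",
    "Pd": "Dash_Punctuation",
    "Po": "Other_Punctuation",
    "Sm": "Math_Symbol",
    "Zs": "Space_Separator",
}


def category_counts(values):
    """Count the distinct characters across `values`, grouped by Unicode general category."""
    distinct_chars = set("".join(values))
    counts = Counter(unicodedata.category(ch) for ch in distinct_chars)
    return {CATEGORY_NAMES.get(cat, cat): n for cat, n in counts.items()}


print(category_counts(["man", "auto"]))
print(category_counts(["skoda", "mercedes-benz"]))
```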

model
| Distinct count | 954 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 283265 |
| Missing (%) | 31.9% |
| Memory size | 6.8 MiB |

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| octavia | 32476 | 3.7% | |
| fabia | 23024 | 2.6% | |
| golf | 22855 | 2.6% | |
| focus | 15124 | 1.7% | |
| astra | 14434 | 1.6% | |
| passat | 12775 | 1.4% | |
| a3 | 12645 | 1.4% | |
| corsa | 11619 | 1.3% | |
| fiesta | 8794 | 1.0% | |
| polo | 8244 | 0.9% | |
| Other values (944) | 442973 | 49.9% | |
| (Missing) | 283265 | 31.9% |
Length
| Max length | 23 |
|---|---|
| Mean length | 4.41355035 |
| Min length | 1 |
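
Distinct characters by Unicode category
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase_Letter | 26 | 70.3% |
| Decimal_Number | 10 | 27.0% |
| Dash_Punctuation | 1 | 2.7% |

Distinct characters by Unicode script
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 26 | 70.3% |
| Common | 11 | 29.7% |

Distinct characters by Unicode block
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 37 | 100.0% |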

mileage
| Distinct count | 144550 |
|---|---|
| Unique (%) | 18.1% |
| Missing | 90469 |
| Missing (%) | 10.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |

| Mean | 115906.30837759274 |
|---|---|
| Minimum | 0.0 |
| Maximum | 9999999.0 |
| Zeros | 40475 |
| Zeros (%) | 4.6% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 18742 |
| median | 86376 |
| Q3 | 158408.5 |
| 95-th percentile | 255000 |
| Maximum | 9999999 |
| Range | 9999999 |
| Interquartile range (IQR) | 139666.5 |
Descriptive statistics
| Standard deviation | 344693.7095 |
|---|---|
| Coefficient of variation (CV) | 2.973899474 |
| Kurtosis | 433.8286873 |
| Mean | 115906.3084 |
| Mean absolute deviation (MAD) | 92146.77003 |
| Skewness | 19.49107601 |
| Sum | 9.246530066e+10 |
| Variance | 1.188137534e+11 |
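
Skewness (also the γ1 quoted in the price_eur warning) measures asymmetry; positive values such as mileage's 19.5 indicate a long right tail. The sketch below assumes the adjusted Fisher-Pearson estimator G1 = sqrt(n(n - 1)) / (n - 2) · m3 / m2^(3/2), where m2 and m3 are the second and third central moments; this is what pandas' `Series.skew` computes, but that pandas-profiling reports exactly this estimator is an assumption here.

```python
import math


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness G1."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                             # biased (population) skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1    # small-sample adjustment


print(skewness([1, 2, 3, 4, 5]))  # symmetric sample
print(skewness([1, 2, 3, 10]))    # one large value creates a right tail
```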
Histogram with fixed size bins (bins=10)

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 40475 | 4.6% | |
| 10 | 26831 | 3.0% | |
| 1 | 8693 | 1.0% | |
| 100 | 6533 | 0.7% | |
| 5 | 5315 | 0.6% | |
| 150000 | 3556 | 0.4% | |
| 200000 | 3063 | 0.3% | |
| 15 | 3061 | 0.3% | |
| 160000 | 2867 | 0.3% | |
| 170000 | 2755 | 0.3% | |
| Other values (144540) | 694610 | 78.2% | |
| (Missing) | 90469 | 10.2% |

Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 40475 | 4.6% | |
| 1 | 8693 | 1.0% | |
| 2 | 1826 | 0.2% | |
| 3 | 663 | 0.1% | |
| 4 | 414 | < 0.1% |

Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9999999 | 38 | < 0.1% | |
| 9996083 | 2 | < 0.1% | |
| 9991981 | 1 | < 0.1% | |
| 9983000 | 1 | < 0.1% | |
| 9981655 | 1 | < 0.1% |

manufacture_year
| Distinct count | 1117 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 92350 |
| Missing (%) | 10.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |

| Mean | 2000.8088727166726 |
|---|---|
| Minimum | 0.0 |
| Maximum | 2017.0 |
| Zeros | 29 |
| Zeros (%) | < 0.1% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1997 |
| Q1 | 2004 |
| median | 2009 |
| Q3 | 2013 |
| 95-th percentile | 2015 |
| Maximum | 2017 |
| Range | 2017 |
| Interquartile range (IQR) | 9 |
Descriptive statistics
| Standard deviation | 82.39924251 |
|---|---|
| Coefficient of variation (CV) | 0.04118296537 |
| Kurtosis | 243.7447351 |
| Mean | 2000.808873 |
| Mean absolute deviation (MAD) | 15.6599588 |
| Skewness | -14.24606305 |
| Sum | 1592399764 |
| Variance | 6789.635166 |
Histogram with fixed size bins (bins=10)

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2015 | 109960 | 12.4% | |
| 2012 | 61326 | 6.9% | |
| 2011 | 54899 | 6.2% | |
| 2014 | 50621 | 5.7% | |
| 2013 | 41367 | 4.7% | |
| 2007 | 39644 | 4.5% | |
| 2010 | 39388 | 4.4% | |
| 2008 | 38751 | 4.4% | |
| 2006 | 38594 | 4.3% | |
| 2009 | 36391 | 4.1% | |
| Other values (1107) | 284937 | 32.1% | |
| (Missing) | 92350 | 10.4% |

Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 29 | < 0.1% | |
| 1 | 3 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% | |
| 6 | 1 | < 0.1% |

Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2017 | 2764 | 0.3% | |
| 2016 | 31033 | 3.5% | |
| 2015 | 109960 | 12.4% | |
| 2014 | 50621 | 5.7% | |
| 2013 | 41367 | 4.7% |

engine_displacement
| Distinct count | 4456 |
|---|---|
| Unique (%) | 0.6% |
| Missing | 185795 |
| Missing (%) | 20.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |

| Mean | 2042.765530662711 |
|---|---|
| Minimum | 0.0 |
| Maximum | 32000.0 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1000 |
| Q1 | 1400 |
| median | 1798 |
| Q3 | 1997 |
| 95-th percentile | 3189 |
| Maximum | 32000 |
| Range | 32000 |
| Interquartile range (IQR) | 597 |
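
One common way to turn these quartiles into an outlier screen is Tukey's fences, Q1 - 1.5 · IQR and Q3 + 1.5 · IQR. With engine_displacement's Q1 = 1400 and Q3 = 1997 (IQR = 597), values below 504.5 or above 2892.5 would be flagged, which covers both the tiny values in the minimum list (0, 10, 12, ...) and the 200 entries of 32000. The 1.5 multiplier is a convention, not something the report prescribes; note also that the 95-th percentile (3189) already lies above the upper fence, so this rule would flag at least 5% of the non-missing values, including plausible large engines.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) Tukey fences; values outside them are conventionally flagged."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of engine_displacement, copied from the quantile statistics above.
lower, upper = tukey_fences(1400, 1997)
print(lower, upper)
```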
Descriptive statistics
| Standard deviation | 1961.148841 |
|---|---|
| Coefficient of variation (CV) | 0.9600459826 |
| Kurtosis | 120.9708446 |
| Mean | 2042.765531 |
| Mean absolute deviation (MAD) | 695.0110645 |
| Skewness | 9.837364222 |
| Sum | 1434905920 |
| Variance | 3846104.777 |
Histogram with fixed size bins (bins=10)

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1968 | 53735 | 6.0% | |
| 1598 | 52525 | 5.9% | |
| 1995 | 28161 | 3.2% | |
| 1560 | 19442 | 2.2% | |
| 1197 | 19042 | 2.1% | |
| 1900 | 18096 | 2.0% | |
| 2000 | 17310 | 1.9% | |
| 1896 | 17126 | 1.9% | |
| 1390 | 16321 | 1.8% | |
| 1997 | 15367 | 1.7% | |
| Other values (4446) | 445308 | 50.1% | |
| (Missing) | 185795 | 20.9% |

Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% | |
| 10 | 17 | < 0.1% | |
| 12 | 4 | < 0.1% | |
| 13 | 3 | < 0.1% | |
| 14 | 3 | < 0.1% |

Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 32000 | 200 | < 0.1% | |
| 31999 | 1 | < 0.1% | |
| 31987 | 1 | < 0.1% | |
| 31968 | 5 | < 0.1% | |
| 31966 | 2 | < 0.1% |

engine_power
| Distinct count | 532 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 138657 |
| Missing (%) | 15.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |

| Mean | 98.44853789701043 |
|---|---|
| Minimum | 1.0 |
| Maximum | 999.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 48 |
| Q1 | 68 |
| median | 86 |
| Q3 | 110 |
| 95-th percentile | 184 |
| Maximum | 999 |
| Range | 998 |
| Interquartile range (IQR) | 42 |
Descriptive statistics
| Standard deviation | 49.01260876 |
|---|---|
| Coefficient of variation (CV) | 0.4978500423 |
| Kurtosis | 14.79790505 |
| Mean | 98.4485379 |
| Mean absolute deviation (MAD) | 32.98080479 |
| Skewness | 2.762575673 |
| Sum | 73794169 |
| Variance | 2402.235817 |
Histogram with fixed size bins (bins=10)

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 103 | 41665 | 4.7% | |
| 110 | 40716 | 4.6% | |
| 77 | 38068 | 4.3% | |
| 66 | 36566 | 4.1% | |
| 55 | 31407 | 3.5% | |
| 81 | 30445 | 3.4% | |
| 85 | 28937 | 3.3% | |
| 74 | 23003 | 2.6% | |
| 125 | 20719 | 2.3% | |
| 100 | 19294 | 2.2% | |
| Other values (522) | 438751 | 49.4% | |
| (Missing) | 138657 | 15.6% |

Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% | |
| 2 | 2 | < 0.1% | |
| 3 | 7 | < 0.1% | |
| 4 | 2 | < 0.1% | |
| 6 | 2 | < 0.1% |

Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 999 | 4 | < 0.1% | |
| 998 | 2 | < 0.1% | |
| 997 | 3 | < 0.1% | |
| 995 | 1 | < 0.1% | |
| 968 | 2 | < 0.1% |

body_type
| Distinct count | 9 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 280220 |
| Missing (%) | 31.5% |
| Memory size | 6.8 MiB |

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| other | 491585 | 55.3% | |
| compact | 60348 | 6.8% | |
| coupe | 17858 | 2.0% | |
| stationwagon | 17517 | 2.0% | |
| van | 7813 | 0.9% | |
| offroad | 5683 | 0.6% | |
| sedan | 4856 | 0.5% | |
| convertible | 1291 | 0.1% | |
| transporter | 1057 | 0.1% | |
| (Missing) | 280220 | 31.5% |
Length
| Max length | 12 |
|---|---|
| Mean length | 4.654033649 |
| Min length | 3 |
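
Distinct characters by Unicode category
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase_Letter | 20 | 100.0% |

Distinct characters by Unicode script
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 20 | 100.0% |

Distinct characters by Unicode block
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 20 | 100.0% |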

color_slug
| Distinct count | 14 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 835604 |
| Missing (%) | 94.1% |
| Memory size | 6.8 MiB |

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| black | 10680 | 1.2% | |
| white | 10316 | 1.2% | |
| blue | 9437 | 1.1% | |
| silver | 8161 | 0.9% | |
| red | 5001 | 0.6% | |
| green | 2319 | 0.3% | |
| brown | 2252 | 0.3% | |
| grey | 1634 | 0.2% | |
| beige | 1052 | 0.1% | |
| yellow | 571 | 0.1% | |
| Other values (4) | 1201 | 0.1% | |
| (Missing) | 835604 | 94.1% |
Length
| Max length | 6 |
|---|---|
| Mean length | 3.104806424 |
| Min length | 3 |
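
Distinct characters by Unicode category
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase_Letter | 20 | 100.0% |

Distinct characters by Unicode script
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 20 | 100.0% |

Distinct characters by Unicode block
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 20 | 100.0% |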

stk_year
| Distinct count | 113 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 427803 |
| Missing (%) | 48.2% |
| Memory size | 6.8 MiB |

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| None | 326432 | 36.8% | |
| 2018 | 45875 | 5.2% | |
| 2017 | 45220 | 5.1% | |
| 2016 | 31129 | 3.5% | |
| 2019 | 11105 | 1.3% | |
| 2015 | 212 | < 0.1% | |
| 2020 | 209 | < 0.1% | |
| 2021 | 26 | < 0.1% | |
| 3000 | 18 | < 0.1% | |
| 2500 | 16 | < 0.1% | |
| Other values (103) | 183 | < 0.1% | |
| (Missing) | 427803 | 48.2% |
Length
| Max length | 4 |
|---|---|
| Mean length | 3.518363528 |
| Min length | 3 |
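
Distinct characters by Unicode category
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 10 | 66.7% |
| Lowercase_Letter | 4 | 26.7% |
| Uppercase_Letter | 1 | 6.7% |

Distinct characters by Unicode script
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 10 | 66.7% |
| Latin | 5 | 33.3% |

Distinct characters by Unicode block
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 15 | 100.0% |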

transmission
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 185429 |
| Missing (%) | 20.9% |
| Memory size | 6.8 MiB |

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| man | 505358 | 56.9% | |
| auto | 197441 | 22.2% | |
| (Missing) | 185429 | 20.9% |
Length
| Max length | 4 |
|---|---|
| Mean length | 3.222286395 |
| Min length | 3 |
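
Distinct characters by Unicode category
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase_Letter | 6 | 100.0% |

Distinct characters by Unicode script
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 6 | 100.0% |

Distinct characters by Unicode block
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 6 | 100.0% |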

door_count
| Distinct count | 14 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 153884 |
| Missing (%) | 17.3% |
| Memory size | 6.8 MiB |

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 282207 | 31.8% | |
| 5 | 224029 | 25.2% | |
| None | 118487 | 13.3% | |
| 2 | 76926 | 8.7% | |
| 3 | 30186 | 3.4% | |
| 0 | 2083 | 0.2% | |
| 6 | 324 | < 0.1% | |
| 1 | 84 | < 0.1% | |
| 7 | 10 | < 0.1% | |
| 55 | 4 | < 0.1% | |
| Other values (4) | 4 | < 0.1% | |
| (Missing) | 153884 | 17.3% |
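
door_count (like seat_count and stk_year) stores the literal string "None" as an ordinary category, separate from the truly missing cells. Treating both as missing changes the picture: for door_count, 153884 missing cells plus 118487 "None" strings leave 272371 rows (30.7%) without a usable value. A stdlib sketch of that normalization, assuming the raw values arrive either as strings or as Python `None`; the `parse_count` helper is hypothetical.

```python
from typing import Optional


def parse_count(raw: Optional[str]) -> Optional[int]:
    """Convert a door/seat count to int, treating missing cells and the string 'None' alike."""
    if raw is None or raw.strip() in ("", "None"):
        return None
    return int(raw)


values = ["4", "5", "None", None, "2"]
print([parse_count(v) for v in values])

# Reported door_count figures: truly missing cells plus literal "None" strings.
missing, none_strings, n_rows = 153884, 118487, 888228
unusable = missing + none_strings
print(unusable, f"{100 * unusable / n_rows:.1f}%")
```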
Length
| Max length | 4 |
|---|---|
| Mean length | 1.746695668 |
| Min length | 1 |
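
Distinct characters by Unicode category
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 9 | 64.3% |
| Lowercase_Letter | 4 | 28.6% |
| Uppercase_Letter | 1 | 7.1% |

Distinct characters by Unicode script
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 9 | 64.3% |
| Latin | 5 | 35.7% |

Distinct characters by Unicode block
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 14 | 100.0% |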

seat_count
| Distinct count | 42 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 187285 |
| Missing (%) | 21.1% |
| Memory size | 6.8 MiB |

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 442656 | 49.8% | |
| None | 134217 | 15.1% | |
| 4 | 61116 | 6.9% | |
| 7 | 24875 | 2.8% | |
| 2 | 18086 | 2.0% | |
| 3 | 8312 | 0.9% | |
| 6 | 3574 | 0.4% | |
| 9 | 3121 | 0.4% | |
| 0 | 3027 | 0.3% | |
| 8 | 1730 | 0.2% | |
| Other values (32) | 229 | < 0.1% | |
| (Missing) | 187285 | 21.1% |
Length
| Max length | 4 |
|---|---|
| Mean length | 1.875124405 |
| Min length | 1 |
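
Distinct characters by Unicode category
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 10 | 66.7% |
| Lowercase_Letter | 4 | 26.7% |
| Uppercase_Letter | 1 | 6.7% |

Distinct characters by Unicode script
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 10 | 66.7% |
| Latin | 5 | 33.3% |

Distinct characters by Unicode block
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 15 | 100.0% |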

fuel_type
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 462223 |
| Missing (%) | 52.0% |
| Memory size | 6.8 MiB |

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| gasoline | 225760 | 25.4% | |
| diesel | 191467 | 21.6% | |
| electric | 6653 | 0.7% | |
| lpg | 1844 | 0.2% | |
| cng | 281 | < 0.1% | |
| (Missing) | 462223 | 52.0% |
Length
| Max length | 8 |
|---|---|
| Mean length | 4.954977776 |
| Min length | 3 |
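
Distinct characters by Unicode category
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase_Letter | 13 | 100.0% |

Distinct characters by Unicode script
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 13 | 100.0% |

Distinct characters by Unicode block
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 13 | 100.0% |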

date_created
| Distinct count | 888228 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.8 MiB |

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2016-01-18 07:17:32.222776+00 | 1 | < 0.1% | |
| 2016-09-24 18:10:16.125849+00 | 1 | < 0.1% | |
| 2015-12-10 10:22:18.042673+00 | 1 | < 0.1% | |
| 2016-06-24 18:16:55.069692+00 | 1 | < 0.1% | |
| 2016-02-18 04:43:19.77417+00 | 1 | < 0.1% | |
| 2016-01-11 02:45:08.422225+00 | 1 | < 0.1% | |
| 2016-02-26 11:02:40.707129+00 | 1 | < 0.1% | |
| 2016-02-25 02:00:10.44408+00 | 1 | < 0.1% | |
| 2016-02-17 20:17:26.904174+00 | 1 | < 0.1% | |
| 2016-02-23 06:28:47.4043+00 | 1 | < 0.1% | |
| Other values (888218) | 888218 | > 99.9% |
Length
| Max length | 29 |
|---|---|
| Mean length | 28.88901273 |
| Min length | 22 |

Distinct characters by Unicode category
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 10 | 66.7% |
| Other_Punctuation | 2 | 13.3% |
| Space_Separator | 1 | 6.7% |
| Math_Symbol | 1 | 6.7% |
| Dash_Punctuation | 1 | 6.7% |

Distinct characters by Unicode script
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 15 | 100.0% |

Distinct characters by Unicode block
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 15 | 100.0% |

date_last_seen
| Distinct count | 839005 |
|---|---|
| Unique (%) | 94.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.8 MiB |

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2016-01-27 20:40:15.46361+00 | 49224 | 5.5% | |
| 2016-07-03 18:15:27.873481+00 | 1 | < 0.1% | |
| 2017-03-16 01:12:36.938189+00 | 1 | < 0.1% | |
| 2016-07-03 17:43:07.532685+00 | 1 | < 0.1% | |
| 2016-07-03 17:25:41.771938+00 | 1 | < 0.1% | |
| 2017-01-27 09:14:33.037769+00 | 1 | < 0.1% | |
| 2016-02-10 20:36:29.294882+00 | 1 | < 0.1% | |
| 2015-12-16 19:46:54.735655+00 | 1 | < 0.1% | |
| 2016-07-03 18:20:36.835577+00 | 1 | < 0.1% | |
| 2016-02-11 08:54:42.891795+00 | 1 | < 0.1% | |
| Other values (838995) | 838995 | 94.5% |
Length
| Max length | 29 |
|---|---|
| Mean length | 28.83996789 |
| Min length | 22 |

Distinct characters by Unicode category
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 10 | 66.7% |
| Other_Punctuation | 2 | 13.3% |
| Space_Separator | 1 | 6.7% |
| Math_Symbol | 1 | 6.7% |
| Dash_Punctuation | 1 | 6.7% |

Distinct characters by Unicode script
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 15 | 100.0% |

Distinct characters by Unicode block
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 15 | 100.0% |
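
The warnings suggest `pd.to_datetime()` for date_created and date_last_seen. The same conversion can be sketched with the standard library alone. The values end in a bare hour offset (`+00`), and the 22-character minimum length implies some values have no fractional seconds, so the sketch splits off the offset and picks the format accordingly. It then computes the gap between date_created and date_last_seen for the first row of the First rows table. `parse_timestamp` is a hypothetical helper, not part of any library.

```python
import re
from datetime import datetime, timedelta, timezone

# e.g. "2016-01-18 07:17:32.222776+00": optional fraction, then a bare "+HH" offset.
_TS = re.compile(
    r"(?P<base>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{1,6})?)(?P<offset>[+-]\d{2})$"
)


def parse_timestamp(text: str) -> datetime:
    """Parse a report timestamp into a timezone-aware datetime."""
    match = _TS.match(text)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {text!r}")
    base = match.group("base")
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in base else "%Y-%m-%d %H:%M:%S"
    tz = timezone(timedelta(hours=int(match.group("offset"))))
    return datetime.strptime(base, fmt).replace(tzinfo=tz)


# date_created and date_last_seen of the first row in the "First rows" table.
created = parse_timestamp("2016-03-01 04:22:44.466271+00")
last_seen = parse_timestamp("2016-07-03 17:16:27.50003+00")
print(last_seen - created)
```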

price_eur
| Distinct count | 112039 |
|---|---|
| Unique (%) | 12.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |

| Mean | 3062208.4001452764 |
|---|---|
| Minimum | 0.04 |
| Maximum | 2706149053064.4 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 0.04 |
|---|---|
| 5-th percentile | 1073.28 |
| Q1 | 1295.34 |
| median | 7327.91 |
| Q3 | 16281.79 |
| 95-th percentile | 34848.916 |
| Maximum | 2.706149053e+12 |
| Range | 2.706149053e+12 |
| Interquartile range (IQR) | 14986.45 |
Descriptive statistics
| Standard deviation | 2871372340 |
|---|---|
| Coefficient of variation (CV) | 937.6802507 |
| Kurtosis | 888227.745 |
| Mean | 3062208.4 |
| Mean absolute deviation (MAD) | 6100290.459 |
| Skewness | 942.4582842 |
| Sum | 2.719939243e+12 |
| Variance | 8.244779116e+18 |
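
The two common meanings of "MAD" behave very differently on data like price_eur. The mean absolute deviation about the mean is pulled up by a single extreme value such as the 2.7e12 maximum, while the median absolute deviation about the median cannot exceed the interquartile range (about 15,000 here). The reported value of about 6.1 million is therefore consistent only with the former. A stdlib sketch on a toy sample with one outlier (the sample is illustrative, not taken from the dataset):

```python
import statistics


def mean_absolute_deviation(values):
    """Average absolute distance from the mean."""
    mean = statistics.fmean(values)
    return statistics.fmean(abs(x - mean) for x in values)


def median_absolute_deviation(values):
    """Median absolute distance from the median (unscaled)."""
    median = statistics.median(values)
    return statistics.median(abs(x - median) for x in values)


toy_prices = [1000, 2000, 3000, 4000, 1_000_000]  # one extreme listing
print(mean_absolute_deviation(toy_prices))
print(median_absolute_deviation(toy_prices))
```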
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[4.00000000e-02 4.50000000e-02 6.50000000e-02 7.50000000e-02 1.15000000e-01 ... 2.69401098e+07 2.69503514e+07 2.87679968e+07 1.28878612e+08 2.70614905e+12], "bayesian blocks" binning strategy used)

Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1295.34 | 168938 | 19.0% | |
| 9900 | 1643 | 0.2% | |
| 10900 | 1616 | 0.2% | |
| 11900 | 1574 | 0.2% | |
| 12900 | 1546 | 0.2% | |
| 8900 | 1489 | 0.2% | |
| 13900 | 1411 | 0.2% | |
| 14900 | 1402 | 0.2% | |
| 3500 | 1400 | 0.2% | |
| 6900 | 1398 | 0.2% | |
| Other values (112029) | 705811 | 79.5% |

Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.04 | 788 | 0.1% | |
| 0.05 | 29 | < 0.1% | |
| 0.06 | 41 | < 0.1% | |
| 0.07 | 117 | < 0.1% | |
| 0.08 | 16 | < 0.1% |

Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.706149053e+12 | 1 | < 0.1% | |
| 971219350 | 1 | < 0.1% | |
| 157668401.9 | 1 | < 0.1% | |
| 100088822.1 | 1 | < 0.1% | |
| 100003700 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and under positive rescaling of either variable, so for an exact linear relationship the slope (the angle to the x-axis) does not affect r; only its sign does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
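
The definition translates directly into code. The stdlib sketch below also illustrates the invariance: shifting y and rescaling it by a positive factor keeps r = 1, a negative factor flips the sign, and a curved but increasing relationship gives r < 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)  # the 1/n (or 1/(n-1)) normalizations cancel


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3 * v + 7 for v in x]))   # positive slope
print(pearson_r(x, [-0.5 * v for v in x]))    # negative slope
print(pearson_r(x, [v ** 2 for v in x]))      # monotonic but not linear
```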
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
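
In other words, ρ is Pearson's r computed on ranks. The stdlib sketch below assigns tied values the average of their positions, a common convention that we assume here rather than one stated in the report; on a strictly increasing cubic relationship it gives ρ = 1 while Pearson's r stays below 1.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys))


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(spearman_rho(x, y), pearson_r(x, y))
```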
Kendall's τ
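Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations: a pair is concordant if X and Y order the two observations the same way, and discordant if they order them in opposite ways. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2. This is the τ-a variant; when there are ties, the commonly used τ-b variant (the default in SciPy's kendalltau) adjusts the denominator for tied pairs.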
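
The pair-counting definition of τ-a can be sketched in a few lines of stdlib Python. This direct version examines every pair, so its cost grows quadratically: at 888228 rows that is about 3.9 × 10^11 pairs, which is why library implementations use faster sort-based methods.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; pairs tied in X or Y count as neither."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1   # same ordering in X and Y
        elif s < 0:
            discordant += 1   # opposite ordering
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant, 6 pairs
```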
First rows
| df_index |  | maker | model | mileage | manufacture_year | engine_displacement | engine_power | body_type | color_slug | stk_year | transmission | door_count | seat_count | fuel_type | date_created | date_last_seen | price_eur |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2282567 | ford | fiesta | 129000.0 | 2009.0 | NaN | 88.0 | other | NaN | NaN | man | NaN | NaN | NaN | 2016-03-01 04:22:44.466271+00 | 2016-07-03 17:16:27.50003+00 | 6900.00 |
| 1 | 845119 | seat | leon | NaN | 2004.0 | 4245.0 | NaN | compact | NaN | None | NaN | None | None | gasoline | 2015-12-22 06:54:54.986737+00 | 2016-01-07 10:08:16.993797+00 | 3886.01 |
| 2 | 645106 | smart | fortwo | 99985.0 | 2004.0 | 698.0 | 37.0 | NaN | NaN | None | auto | 2 | 2 | gasoline | 2015-12-12 17:23:52.693186+00 | 2016-02-10 20:13:12.406986+00 | 2000.22 |
| 3 | 2468530 | ford | focus | 212020.0 | 2002.0 | 1753.0 | 66.0 | other | NaN | NaN | man | 5 | 5 | NaN | 2016-03-04 22:05:51.891484+00 | 2016-07-03 17:34:54.394008+00 | 1600.00 |
| 4 | 3478350 | volkswagen | golf | 180000.0 | 2003.0 | 1900.0 | 74.0 | other | NaN | NaN | man | NaN | NaN | NaN | 2017-02-28 18:34:33.090724+00 | 2017-03-06 01:07:20.215485+00 | 1295.34 |
| 5 | 1808645 | mercedes-benz | NaN | 188000.0 | 2002.0 | NaN | 105.0 | other | NaN | NaN | man | 4 | 5 | NaN | 2016-02-19 02:06:25.780919+00 | 2016-07-03 18:31:35.520083+00 | 2199.81 |
| 6 | 1135859 | fiat | grande-punto | 85613.0 | 2007.0 | 1248.0 | 55.0 | NaN | black | None | man | 3 | 5 | diesel | 2016-01-09 12:31:01.980036+00 | 2016-07-03 17:02:41.352873+00 | 3800.00 |
| 7 | 589975 | NaN | NaN | 36500.0 | 2013.0 | 1560.0 | 84.0 | NaN | NaN | None | man | 4 | 5 | diesel | 2015-12-10 08:46:37.466293+00 | 2016-01-24 20:50:30.634275+00 | 12176.46 |
| 8 | 3522373 | volvo | s40 | 187192.0 | 2004.0 | 2435.0 | 125.0 | sedan | blue | NaN | NaN | 4 | 5 | gasoline | 2017-03-10 16:11:27.006883+00 | 2017-03-10 16:11:27.006883+00 | 1295.34 |
| 9 | 2167119 | renault | megane | 116000.0 | 2011.0 | 1461.0 | 81.0 | other | NaN | 2018 | auto | 5 | 5 | NaN | 2016-02-27 02:17:25.21999+00 | 2016-07-03 19:27:06.221456+00 | 8950.00 |
Last rows
| df_index |  | maker | model | mileage | manufacture_year | engine_displacement | engine_power | body_type | color_slug | stk_year | transmission | door_count | seat_count | fuel_type | date_created | date_last_seen | price_eur |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 888218 | 3419461 | opel | astra | 176000.0 | NaN | NaN | NaN | other | NaN | 2019 | man | NaN | NaN | NaN | 2017-02-15 18:46:20.92016+00 | 2017-02-18 01:53:02.101921+00 | 1295.34 |
| 888219 | 2382921 | mercedes-benz | NaN | 269000.0 | 2012.0 | 1796.0 | 80.0 | other | NaN | NaN | auto | 4 | 5 | NaN | 2016-03-03 08:30:42.61805+00 | 2016-07-03 17:26:15.120508+00 | 9007.66 |
| 888220 | 3153466 | audi | 200 | 268000.0 | 2003.0 | NaN | NaN | other | NaN | NaN | NaN | NaN | NaN | electric | 2016-12-08 18:03:25.632454+00 | 2017-02-07 06:00:24.639179+00 | 1295.34 |
| 888221 | 2631653 | skoda | fabia | 13254.0 | 2012.0 | 1197.0 | 63.0 | other | NaN | NaN | man | 5 | 5 | NaN | 2016-03-08 18:44:41.233901+00 | 2016-07-03 17:51:25.907082+00 | 11450.00 |
| 888222 | 1950514 | renault | clio | 97000.0 | 2011.0 | 1461.0 | 55.0 | other | NaN | NaN | man | 5 | 5 | NaN | 2016-02-22 18:01:15.122266+00 | 2016-07-03 19:11:52.179141+00 | 7500.00 |
| 888223 | 1280030 | NaN | NaN | 168000.0 | 2006.0 | NaN | NaN | van | NaN | None | man | None | None | diesel | 2016-01-17 03:56:11.289348+00 | 2016-01-20 13:18:33.010844+00 | 4490.00 |
| 888224 | 1818013 | mercedes-benz | NaN | 108000.0 | 1991.0 | 1997.0 | 90.0 | other | NaN | NaN | man | 4 | 5 | NaN | 2016-02-19 07:02:39.933688+00 | 2016-07-03 18:26:21.561088+00 | 2991.12 |
| 888225 | 2482226 | ford | NaN | 2.0 | 2015.0 | 3198.0 | 147.0 | other | NaN | NaN | auto | 4 | 5 | NaN | 2016-03-05 03:43:50.498735+00 | 2016-07-03 17:36:26.70881+00 | 28140.53 |
| 888226 | 290990 | bmw | x1 | 65000.0 | 2010.0 | 1995.0 | 150.0 | NaN | NaN | None | man | 4 | 5 | diesel | 2015-12-02 03:15:58.497633+00 | 2015-12-14 04:46:05.466569+00 | 19902.22 |
| 888227 | 2146943 | maserati | ghibli | 17800.0 | 2015.0 | 2987.0 | 202.0 | other | NaN | NaN | auto | 4 | 5 | NaN | 2016-02-26 17:11:52.695165+00 | 2016-07-03 19:26:17.572593+00 | 55900.00 |